ON A CERTAIN PENALTY METHOD IN OPTIMAL
CONTROL AND DIFFERENTIAL GAMES


by 


Ronald  J.  Stern* 


ABSTRACT 

The penalty technique introduced in [6] is applied to linear-quadratic
optimal control problems, N-person non-zero sum differential games,
efficient point problems in linear control problems with multiple
quadratic criteria, and to bicriterion optimal control problems. In all
these cases the reason for applying the technique is to overcome the
computational difficulty introduced by the imposition of a pointwise
magnitude restraint on the feasible controls. Additional details are
available in [6] - [9].

1.  Introduction.  In recent works a new penalty technique has
been employed to derive procedures for computing optimal open loop
controls for two person zero sum linear-quadratic differential games
[6] and N person non-zero sum linear-quadratic differential games [7].
The method also has been applied to compute open loop solutions to
bicriterion control problems and efficient points for multicriteria
problems in control [8], [9]. "Open loop" solutions, it will be
recalled, are solutions which are not of the feedback type; that is,
they are functions of time only. In each of the problems mentioned
above, the penalty method is employed to overcome computational
difficulties which arise from the imposition of a pointwise magnitude
restraint on the feasible controls.

In section 2 we give a detailed exposition of the technique, for
the case of a general linear-quadratic optimal control problem with
pointwise magnitude restraints. In subsequent sections we outline
the application of the technique to the various problems mentioned in
the preceding paragraph, by referring to section 2.

2.  Application to Linear-Quadratic Optimal Control.  Consider the
following linear system of differential equations in $R^m$:

(2.1)  $\dot{x} = A(t)x + B(t)u \qquad (t_0 \le t \le T_0)$

with initial condition

(2.2)  $x(t_0) = x_0$

Here A(t) and B(t) are continuous $(m \times m)$ and $(m \times s)$
matrices on the compact time interval $[t_0, T_0]$. Feasible controls
u = u(t) are Lebesgue-measurable $R^s$-valued functions which almost
everywhere on $[t_0, T_0]$ take values in U, the closed unit ball with
center 0 in $R^s$. We shall denote this class of feasible controls by
$\mathcal{U}$.

For each $u \in \mathcal{U}$ a unique solution to (2.1) - (2.2) is
determined by

(2.3)  $x(t) = S(t, t_0)x_0 + \int_{t_0}^{t} S(t,s)B(s)u(s)\,ds$

where S is the fundamental solution of $\dot{x} = A(t)x$ (see e.g., [1]).
We now introduce an objective function of quadratic type:

(2.3)  $J(u) = \langle x(T_0) - \xi,\ W[x(T_0) - \xi] \rangle + \int_{t_0}^{T_0} \langle \bar{x}(t) - C(t)x(t),\ Q(t)[\bar{x}(t) - C(t)x(t)] \rangle\,dt - \int_{t_0}^{T_0} \langle u(t), u(t) \rangle\,dt$

Here $\xi$ is a fixed vector in $R^m$, W is a constant symmetric
$(m \times m)$ matrix, $\bar{x}(t)$ is a continuous $R^m$-valued function
on $[t_0, T_0]$, C(t) is a continuous $(m \times m)$ matrix, and Q(t) is a
continuous symmetric $(m \times m)$ matrix on $[t_0, T_0]$. x(t) denotes
the solution of (2.1) - (2.2) corresponding to u.

The following lemma is required. The proof, which is omitted, is of
a computational nature and makes use of bounds on the various parameters
in (2.1) - (2.3).

Lemma 2.1.  There exists N > 0 such that $T_0 - t_0 \le N$ implies the
following:

(i)  J(u) is a strictly concave functional on $L^{2,s}(t_0,T_0)$, the
space of measurable functions u satisfying
$\int_{t_0}^{T_0} |u(t)|^2\,dt < \infty$, where $|\cdot|$ is the
Euclidean norm.

(ii)  J(u) is bounded above over $L^{2,s}(t_0,T_0)$.

Now we introduce the following pair of problems:

$P_1$    maximize J(u)
         subject to $u \in L^{2,s}(t_0,T_0)$

$P_2$    maximize J(u)
         subject to $u \in \mathcal{U}$.

In view of Lemma 2.1, the condition

(2.4)  $\frac{d}{d\varepsilon} J(u + \varepsilon v)\Big|_{\varepsilon = 0} = 0$  for all $v \in L^{2,s}(t_0,T_0)$

is sufficient for u to be the (unique) solution of $P_1$. (2.4) gives rise
to the following integral equation:

(2.5)  $u(s) = B^*(s)S^*(T_0,s)W[x(T_0) - \xi] + \int_{s}^{T_0} B^*(s)S^*(t,s)C^*(t)Q(t)[C(t)x(t) - \bar{x}(t)]\,dt.$

Here  x(t)    is   expressed   by   (2.3)    in   terms   of  u(t).     We  write   (2.5)   as 
follows: 

(2.6)  $u = \mathcal{A}u$

where $\mathcal{A}: C^{0,s}[t_0,T_0] \to C^{0,s}[t_0,T_0]$. (Here we
denote by $C^{0,s}[t_0,T_0]$ the space of continuous $R^s$-valued
functions on $[t_0,T_0]$.) It is easily shown (see e.g. [2]) that if
$T_0 - t_0$ is sufficiently small then $\mathcal{A}$ is a contraction.

We therefore have

Theorem 2.1.  There exists N > 0 such that $T_0 - t_0 \le N$ implies
$P_1$ has a unique solution.
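The contraction property of the operator in (2.6) is what makes successive approximation workable. As a minimal sketch only, the Python fragment below iterates a map on a time grid until the sup-norm change is small. The operator `toy_map`, the grid, and the tolerance are illustrative assumptions and are not the operator of (2.5); the toy map $(Au)(s) = 1 + \int_s^T u(t)\,dt$ is a contraction in the sup norm when T < 1, and its fixed point is $e^{T-s}$.

```python
import numpy as np

def successive_approximations(apply_map, u0, tol=1e-10, max_iter=500):
    """Iterate u <- apply_map(u) until the sup-norm change drops below tol."""
    u = np.asarray(u0, dtype=float)
    for iteration in range(1, max_iter + 1):
        u_next = apply_map(u)
        if np.max(np.abs(u_next - u)) < tol:
            return u_next, iteration
        u = u_next
    raise RuntimeError("no convergence; the map may not be a contraction")

# Hypothetical toy problem on [0, T] (an assumption, not from the paper).
T = 0.5
grid = np.linspace(0.0, T, 201)
h = grid[1] - grid[0]

def toy_map(u):
    # Trapezoidal integral of u from 0 to each grid point ...
    cumulative = np.concatenate(([0.0], np.cumsum(0.5 * h * (u[1:] + u[:-1]))))
    # ... converted to the tail integral from s to T, plus the constant 1.
    return 1.0 + (cumulative[-1] - cumulative)

u_fixed, n_iter = successive_approximations(toy_map, np.zeros_like(grid))
```

Here `u_fixed` should agree with `np.exp(T - grid)` up to the discretization error of the trapezoidal rule.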
This solution can be identified as the uniform limit of a sequence
of successive approximations using (2.5). The case for problem $P_2$ is
not as straightforward; we now turn our attention to it. To this
end, we first introduce a new payoff functional.

(2.7)  $J^k(u) = J(u) - \int_{t_0}^{T_0} |u(t)|^{2k}\,dt$

where k is a positive integer. The general penalty method discussed in
this paper uses the (computable) optimal payoffs of an unconstrained
problem with (2.7) as payoff in order to approximate the optimal payoff of
$P_2$ as $k \to \infty$. We shall denote by $P_k$ the control problem with
payoff given by (2.7) and with the only requirement for the feasibility
of u being membership in $L^{2k,s}(t_0,T_0)$.
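To see how the penalty in (2.7) mimics the constraint $|u| \le 1$, consider a scalar toy problem that is not taken from the paper: maximize $-(u-c)^2$ over $|u| \le 1$ with c = 2, whose constrained maximizer is u = 1 with payoff -1. The sketch below computes the unconstrained maximizer of the penalized payoff $-(u-c)^2 - u^{2k}$ by bisection on its derivative. The function names and the choice of c are assumptions made for illustration.

```python
def penalized_maximizer(c, k, tol=1e-13):
    """Maximizer of -(u - c)**2 - u**(2*k) over real u, for moderate c >= 0.

    The payoff is strictly concave, so the maximizer is the unique zero of
    the decreasing derivative g(u) = 2*(c - u) - 2*k*u**(2*k - 1) on [0, c].
    """
    lo, hi = 0.0, float(c)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 2.0 * (c - mid) - 2.0 * k * mid ** (2 * k - 1) > 0.0:
            lo = mid  # derivative still positive: maximizer lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def penalized_payoff(u, c, k):
    """Toy analog of (2.7): quadratic payoff minus the penalty u**(2k)."""
    return -(u - c) ** 2 - u ** (2 * k)

c = 2.0  # hypothetical target outside the unit interval
results = {k: penalized_maximizer(c, k) for k in (1, 5, 50, 500)}
payoffs = {k: penalized_payoff(u, c, k) for k, u in results.items()}
```

For large k the maximizer in `results` approaches 1 and the penalized payoff in `payoffs` approaches the constrained optimum -1, which is the behavior the approximation relies on.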

We defer the proof of the following theorem until later in this
section.

Theorem 2.2.  There exists N > 0 such that $T_0 - t_0 \le N$ implies
$P_k$ has a unique solution.

We now define the following map $\varphi^k$ of $R^s$ into $R$.


^>  (v)  = 


■  £  I  I  ^  1  -l/2k 
1  f   V I  <  k 


otherwise 


We also define the following payoff functional:

(2.8)  $J^k_\varphi(u) = J(u) - \int_{t_0}^{T_0} \varphi^k(u(t))\,dt$

Note that

(2.9)  $|J^k(u) - J^k_\varphi(u)| \le \frac{T_0 - t_0}{k}$

for any $u \in L^{2k,s}(t_0,T_0)$.


Let us denote the solution of $P_k$ by $u^k$. Proof of the following
result is of a computational nature (using Hölder's inequality) and
will be omitted.

Lemma  2.2.   T  -  t  <  N  implies  that  there  is  a  real  Q  >  0  such 

■    o    o  ~   — ' — : 

that 

T 

sup  J   |u  (t) I    dt  <  Q. 
k 


t 
o 


Let  "SJ        be  the  set  of  vectors  in  R  with  Euclidean  length 


\J  ,  Lemma  2.2  implies 


<  1  -  k      and  let   P,  (•)  denote  the  Euclidean  distance  in  R  from 


T 

(2.10)   J    ^  ^  (/(t))  dt  <  -2- 
t    \  k 


2'is 
from  Theorem  2.2  and  (2.9)  we  have,  for  any  ije  L  ^  (t  ,T  ),  the 


following: 

,rk    .         1 

o  o 


^.  k.    T  -' 


(2.11)   J(u)  >  J    (u  )  - 
for  each  positive  integer  k. 

Now let $\bar{P}_k$ denote the problem with objective function J(u), but
where admissible controls are those Lebesgue measurable functions which
almost everywhere on $[t_0,T_0]$ are valued in $U_k$.

We now prove the following lemma:

Lemma 2.3.  Let $T_0 - t_0 \le N$. Then there exists a real D > 0 such
that the following is true: for each positive integer k there is a
control $\hat{u}^k$ such that

(2.12)  $J(\hat{u}^k) \ge J(u) - D(k^{-1/2} + 1 - k^{-1/2k}) - \frac{T_0 - t_0}{k}$

for any control u feasible for $\bar{P}_k$.
Proof.  Let v be any vector in $R^s$. Let


V  = 


if   V  e  u^ 
k       V      otherwise 

^  FT 

By a simple argument we have that $\hat{u}(t)$ is Lebesgue measurable for
any Lebesgue measurable control u(t).

A routine calculation (additional details are to be found in [6])
yields (2.12). (2.10) and (2.11) are utilized here.
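The truncation used in the proof can be pictured as a radial projection onto a ball. The following is a minimal sketch, assuming the control is sampled on a time grid (one row per sample) and that the radius of $U_k$ is supplied by the caller through the hypothetical parameter `radius` rather than fixed here.

```python
import numpy as np

def radial_truncation(v, radius):
    """Return v if |v| <= radius, otherwise rescale v onto the sphere."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm <= radius:
        return v
    return (radius / norm) * v

def truncate_control(u_samples, radius):
    """Apply the radial truncation to every time sample (row) of a control."""
    u_samples = np.asarray(u_samples, dtype=float)
    norms = np.linalg.norm(u_samples, axis=1, keepdims=True)
    safe_norms = np.where(norms > 0.0, norms, 1.0)  # avoid division by zero
    scale = np.where(norms > radius, radius / safe_norms, 1.0)
    return scale * u_samples

# Hypothetical samples of a two-dimensional control at three time points.
samples = [[3.0, 4.0], [0.1, 0.2], [0.0, 0.0]]
truncated = truncate_control(samples, radius=0.5)
```

Samples already inside the ball are left unchanged, so the truncated control differs from the original only where the original leaves the ball.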

We now have the following:

Theorem 2.3.  If $T_0 - t_0 \le N$ then

$J(\hat{u}^k) \to \sup_{u \in \mathcal{U}} J(u)$

The proof is found (subject to minor changes) in [7]. The weak
topology is employed in the argument, which is similar in spirit to a
result in [s], p. 209.

We turn now to the proof of Theorem 2.2.

Proof of Theorem 2.2.  By arguments similar to those in the proof of
Lemma 2.1 we have negativity of the second Gâteaux differential of
$J^k(u)$ if $T_0 - t_0$ is sufficiently small. (See e.g., [2] for proof.)
The condition of stationarity (i.e., that $J^k(u)$ have zero first Gâteaux
derivative) is then both a necessary and sufficient condition for
optimality. The condition is given by the following integral equation
(see [2] for similar equations):


(2.13)  $u(s) + 2k|u(s)|^{2k-2}u(s) = B^*(s)S^*(T_0,s)W[x(T_0) - \xi] + \int_{s}^{T_0} B^*(s)S^*(t,s)C^*(t)Q(t)[C(t)x(t) - \bar{x}(t)]\,dt.$

(2.13) is next rewritten as follows:

(2.14)  $M^k(u(s)) = T^k[M^k(u(s))]$

Here $M^k: R^s \to R^s$ is given by

$M^k(v) = 2v + 2k|v|^{2k-2}v.$

$M^k$ has an inverse given by

$(M^k)^{-1}(w) = \frac{w}{2 + 2k[r_k(|w|)]^{2k-2}}$

where $r_k(|w|)$ is the unique real root of the polynomial
$2kx^{2k-1} + 2x - |w|$. Note that the expressibility of (2.13) in the
form (2.14) is dependent upon the invertibility of $M^k$.

Each $T^k$ is a contraction when $T_0 - t_0$ is small enough, say
$\le N$; this follows by Theorem 4.2 in [6]. This completes the proof of
the theorem.

(2.13) can be solved computationally. A procedure is given in
section 5 of [6]. This procedure circumvents the problem that $r_k$ has
no explicit form when k > 2.
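Inverting the map M only requires the scalar root of the polynomial $2kx^{2k-1} + 2x - |w|$. The sketch below is not the procedure of [6], section 5; it simply finds the root by bisection, a technique substituted here for illustration, and then checks that the map and its inverse undo each other. All function names are hypothetical.

```python
import numpy as np

def penalty_map(v, k):
    """M(v) = 2 v + 2k |v|^(2k-2) v, as defined in the text."""
    v = np.asarray(v, dtype=float)
    return (2.0 + 2.0 * k * np.linalg.norm(v) ** (2 * k - 2)) * v

def root_r(w_norm, k, tol=1e-14):
    """Nonnegative root of p(x) = 2k x^(2k-1) + 2x - |w|, found by bisection.

    p is increasing, p(0) = -|w| <= 0 and p(|w|/2) >= 0, so the root lies
    in the interval [0, |w|/2].
    """
    lo, hi = np.float64(0.0), np.float64(w_norm) / 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if 2.0 * k * mid ** (2 * k - 1) + 2.0 * mid - w_norm < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def penalty_map_inverse(w, k):
    """Inverse of penalty_map: w / (2 + 2k r^(2k-2)) with r = root_r(|w|)."""
    w = np.asarray(w, dtype=float)
    r = root_r(np.linalg.norm(w), k)
    return w / (2.0 + 2.0 * k * r ** (2 * k - 2))

# Round trip on a hypothetical vector in R^3 with k = 3.
v = np.array([0.3, -0.4, 1.2])
w = penalty_map(v, 3)
v_recovered = penalty_map_inverse(w, 3)
```

Bisection is used because the polynomial is increasing on the nonnegative axis, so a sign change brackets the root without any need for an explicit formula.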

3.  Application to Differential Games.  The outline presented in
this section summarizes results of both references [6] and [7].


The governing dynamics are given by the following system in $R^m$:

(3.1)  $\dot{x} = A(t)x + \sum_{i=1}^{N} B_i(t)u_i \qquad (t_0 \le t \le T_0)$

with initial condition

(3.2)  $x(t_0) = x_0$.

We define N cost functionals of quadratic type:

(3.3)  $J_i(u_1, u_2, \ldots, u_N) = \langle x(T_0) - \xi_i,\ W_i[x(T_0) - \xi_i] \rangle + \int_{t_0}^{T_0} \langle \bar{x}_i(t) - C_i(t)x(t),\ Q_i(t)[\bar{x}_i(t) - C_i(t)x(t)] \rangle\,dt - \int_{t_0}^{T_0} \langle u_i(t), u_i(t) \rangle\,dt, \quad i = 1, 2, \ldots, N$


Assumptions on the parameters of (3.1) - (3.3) parallel those made in
section 2. Admissible controls $u_i$ for the ith player are Lebesgue
measurable functions which almost everywhere take values in the unit
ball of $R^{s_i}$.

For the above game, denoted G, we seek an open loop Nash equilibrium;
that is, a vector of feasible controls
$(\hat{u}_1, \hat{u}_2, \ldots, \hat{u}_N)$ such that

(3.4)  $J_i(\hat{u}_1, \hat{u}_2, \ldots, \hat{u}_N) \ge J_i(\hat{u}_1, \hat{u}_2, \ldots, \hat{u}_{i-1}, u_i, \hat{u}_{i+1}, \ldots, \hat{u}_N)$

for any admissible $u_i$, i = 1, 2, ..., N.

An approximating game, denoted $G_k$, is introduced, for k a positive
integer. The payoff functionals for this differential game are

(3.5)  $J_i^k(u_1, u_2, \ldots, u_N) = J_i(u_1, u_2, \ldots, u_N) - \int_{t_0}^{T_0} |u_i(t)|^{2k}\,dt$

for i = 1, 2, ..., N.

An analog of Theorem 2.2, the proof of which may be found in [7],
is the following:

Theorem 3.1.  If $T_0 - t_0$ is sufficiently small then $G_k$ has a
unique open loop Nash equilibrium.
An analog of Lemma 2.3 is now given:

Lemma 3.1.  If $T_0 - t_0$ is sufficiently small then there exists a
constant D > 0 such that the following is true: For each positive integer
k there is a vector of controls
$(\hat{u}_1^k, \hat{u}_2^k, \ldots, \hat{u}_N^k)$ feasible for
$\bar{G}_k$ such that

(3.6)  $J_i(\hat{u}_1^k, \hat{u}_2^k, \ldots, \hat{u}_N^k) \ge J_i(\hat{u}_1^k, \hat{u}_2^k, \ldots, \hat{u}_{i-1}^k, u_i, \hat{u}_{i+1}^k, \ldots, \hat{u}_N^k) - D(k^{-1/2} + 1 - k^{-1/2k}) - \frac{T_0 - t_0}{k}$

where $\bar{G}_k$ is the game with the same payoffs as G but where
feasible controls $u_i$ are Lebesgue measurable and are valued almost
everywhere in the ball of radius $k^{-1/2k}$ in $R^{s_i}$.

By properties of the weak topology it is shown in [7] that G has
an open loop Nash equilibrium which can be represented as the weak limit
of a subsequence of $(\hat{u}_1^k, \hat{u}_2^k, \ldots, \hat{u}_N^k)$. It
is shown there that the equilibrium costs of G are computable by the
method of [6], section 5.

4.  Application to Multicriteria Control.  Consider the system

(4.1)  $\dot{x} = A(t)x + B(t)u \qquad (t_0 \le t \le T_0)$

with initial condition

(4.2)  $x(t_0) = x_0$

and n criterion functions

(4.3)  $f_i(u) = \langle x(T_0) - \xi_i,\ W_i[x(T_0) - \xi_i] \rangle + \int_{t_0}^{T_0} \langle \bar{x}_i(t) - C_i(t)x(t),\ Q_i(t)[\bar{x}_i(t) - C_i(t)x(t)] \rangle\,dt - \int_{t_0}^{T_0} \langle u(t), u(t) \rangle\,dt, \quad (1 \le i \le n),$

where assumptions on (4.1) - (4.3) parallel those made in section 2.

An efficient point $u^0$ over a class of controls $\Omega$ is a control
$u^0 \in \Omega$ such that for no other $u \in \Omega$

$f_i(u) \ge f_i(u^0), \quad 1 \le i \le n$

with at least one strict inequality.

The class of controls $\Omega$ in the above definition will be taken
here as one of the following two classes.

$L^{2,s}(t_0,T_0)$ - the space of controls u with
$\int_{t_0}^{T_0} |u(t)|^2\,dt < \infty$, where $|\cdot|$ denotes the
Euclidean norm.

$\mathcal{U}$ - the class of measurable controls with $|u(t)| \le 1$ a.e.
on $[t_0,T_0]$.

The following two efficient point problems will be studied here:

$E_1$   Find all efficient points over $L^{2,s}(t_0,T_0)$.

$E_2$   Find all efficient points over $\mathcal{U}$.

For the bicriterion case n = 2 we study two other optimal control
problems

$M_1$   Maximize $\min\{f_1(u), f_2(u)\}$ subject to $u \in L^{2,s}(t_0,T_0)$

$M_2$   Maximize $h(f_1(u), f_2(u))$ subject to $u \in \mathcal{U}$,

where $h: R^2 \to R$ is continuous and non-decreasing in each of its
arguments on the non-negative orthant $R^2_+$, and quasiconcave over the
interior of $R^2_+$. We further assume that

$|u(t)| \le 1$ a.e. on $[t_0,T_0]$ $\Rightarrow$ $f_i(u) > 0$, i = 1,2,
which is guaranteed, for example, if $W_i$ are positive definite and
$|x_0|$ is sufficiently large.

We observe that the objective function of $M_1$ is a special case
of the objective function h(x,y) of $M_2$. Other examples are

$h(x,y) = x^{\beta}y$;  $\beta > 0$

or

$h(x,y) = c_1x^{\beta_1} + c_2y^{\beta_2}$;  $c_1, c_2, \beta_1, \beta_2 > 0$.

(For  other  examples  see  e.g.,  TsT  p.  40). 

In what follows let the vector
$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ satisfy

(4.4)  $\sum_{i=1}^{n} \alpha_i = 1$ and $\alpha_i \ge 0$, $1 \le i \le n$.

For each such $\alpha$ we define the following single criterion analog
of $E_1$:

$P_1^{\alpha}$   Maximize $\sum_{i=1}^{n} \alpha_i f_i(u)$ subject to $u \in L^{2,s}(t_0,T_0)$

Similarly to Theorem 2.1 we have

Theorem 4.1.  If $T_0 - t_0$ is sufficiently small then $P_1^{\alpha}$
has a unique open loop solution.

The problems $E_1$ and $P_1^{\alpha}$ are related by the following
Theorem.

Theorem 4.2.  If $T_0 - t_0$ is sufficiently small then

(i)  If the vector $\alpha$ satisfying (4.4) is positive (i.e., if
$\alpha_i > 0$, $1 \le i \le n$), then the solution $u^{\alpha}$ of
$P_1^{\alpha}$ is an efficient point over $L^{2,s}(t_0,T_0)$.

(ii)  If $u^0$ is an efficient point over $L^{2,s}(t_0,T_0)$ then for
some $\alpha^0$ satisfying (4.4), $u^0$ is the solution of
$P_1^{\alpha^0}$.

Proof.  (i) is obvious. (ii) is proved as in the finite dimensional
case, see e.g., [4], Section 7.4.

A method for approximating a solution of $P_1^{\alpha}$ is given by
Theorem 4.2; namely, $u^{\alpha}$ is the uniform limit of a sequence of
successive approximations.

We now define a new problem:

$P_2^{\alpha}$   Maximize $\sum_{i=1}^{n} \alpha_i f_i(u)$ subject to $u \in \mathcal{U}$.

Although $\mathcal{U}$ is not a compact subset of $L^{2,s}(t_0,T_0)$ the
consistency of $[P_2^{\alpha}]$ is guaranteed by the following Theorem.

Theorem 4.3.  If $T_0 - t_0$ is sufficiently small then each problem
$P_2^{\alpha}$ has a unique solution.

See [s] for proof.


The relations between $E_1$ and $P_1^{\alpha}$, studied in Theorem 4.2,
hold also for $E_2$ and $P_2^{\alpha}$.

Theorem 4.4.  If $T_0 - t_0$ is sufficiently small then

(i)  If the vector $\alpha$ satisfying (4.4) is positive, then the
solution of $P_2^{\alpha}$ is an efficient point over $\mathcal{U}$.

(ii)  If $u^0$ is an efficient point over $\mathcal{U}$ then $u^0$ is the
solution of $P_2^{\alpha}$ for some $\alpha$ satisfying (4.4).

The penalty method outlined in section 2 is applicable to problem
$P_2^{\alpha}$.

The following results are proved similarly to the corresponding results
in [3].

Lemma 4.1.  If $T_0 - t_0 \le N$, then $M_1$ and $M_2$ have optimal
solutions, and at least one solution of each is efficient.

Theorem 4.5.  Let $T_0 - t_0 \le N$. Then the following function of
$\alpha$

$h[f_1(u^{\alpha}), f_2(u^{\alpha})]$

is unimodal on [0,1].

Using the customary search techniques for finding the supremum of
a unimodal function, Theorems 4.1, 4.2, Lemma 4.1 and Theorem 4.5
constitute a procedure for approximately solving $M_1$, while Theorems
4.3, 4.4, Lemma 4.1 and Theorem 4.5 similarly constitute a procedure for
approximately solving $M_2$.
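One customary technique for a unimodal function on [0,1] is golden-section search. The sketch below is a generic implementation. The function `toy_objective` is a stand-in chosen for illustration (a minimum of two linear functions, peaking at 2/3); in the procedure described above, each evaluation would instead require solving the corresponding single-criterion problem for the given weight.

```python
import math

def golden_section_max(f, a=0.0, b=1.0, tol=1e-8):
    """Approximate the maximizer of a unimodal function f on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # The maximum lies in [c, b]; reuse d as the new left probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
        else:
            # The maximum lies in [a, d]; reuse c as the new right probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
    return 0.5 * (a + b)

def toy_objective(alpha):
    """Hypothetical unimodal stand-in: increasing, then decreasing."""
    return min(1.0 + alpha, 3.0 - 2.0 * alpha)

best_alpha = golden_section_max(toy_objective)
```

Each iteration reuses one of the two previous function values, so only one new evaluation is needed per step, which matters when every evaluation is itself an optimal control computation.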